Summary¶
Baseline LSTM model that uses CAISO SP15 pricing and load data to predict the real-time market price for the next day. It takes data from 01/01/2023 through 03/01/2025 and predicts the rest of 2025.
Model inputs are:
- day-ahead market price
- day-ahead load forecast
- prior day real-time price
- day, hour, night/day, year indicators
Output:
- deviation of the real-time price from the day-ahead price for the entire next 24-hour block
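As a rough illustration of the inputs and output listed above, the sketch below builds the calendar indicators, a prior-day real-time price, and the deviation target from a small synthetic frame. The column names other than `DAM_PRC`, the random values, the day-of-week reading of "day", and the 07:00-19:00 daytime window are all assumptions made for this example; the project's actual preprocessing is not shown in this notebook and may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data; values and most column names are invented for illustration.
idx = pd.date_range('2025-03-01', periods=48, freq='h')
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'DAM_PRC': rng.normal(40, 10, len(idx)),        # day-ahead market price
    'DAM_LOAD': rng.normal(15000, 1000, len(idx)),  # day-ahead load forecast (assumed name)
    'RTM_PRC': rng.normal(40, 15, len(idx)),        # real-time market price (assumed name)
}, index=idx)

# Calendar indicators derived from the timestamp index
df['hour'] = df.index.hour
df['day_of_week'] = df.index.dayofweek  # assumed meaning of the "day" indicator
df['year'] = df.index.year
df['is_daytime'] = ((df['hour'] >= 7) & (df['hour'] < 19)).astype(int)  # assumed window

# Prior-day real-time price: the value observed 24 hourly rows earlier
df['RTM_PRC_prev_day'] = df['RTM_PRC'].shift(24)

# Target: real-time price deviation from the day-ahead price
df['target'] = df['RTM_PRC'] - df['DAM_PRC']
```

Predicting the deviation rather than the raw price is why the plotting code later adds `dam` back to both the predictions and `y_test`.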
Baseline hyperparameters¶
hidden_size: 64
learning_rate: 0.001
batch_size: 16
dropout: 0.2
CRPS = 6.44
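The CRPS above comes from the project's `get_mean_crps` helper, whose implementation is not shown here. For readers unfamiliar with the metric, one common way to approximate CRPS from quantile forecasts is twice the pinball (quantile) loss averaged over the quantile levels, which is reasonable when the levels form a fairly dense, evenly spaced grid. The function below is a minimal sketch of that approximation under those assumptions; `pinball_crps` is a hypothetical name and is not claimed to match `get_mean_crps`.

```python
import numpy as np

def pinball_crps(y_true, y_pred, quantiles):
    """Approximate mean CRPS as 2x the average pinball loss (illustrative sketch).

    y_true: shape (n,)      observed values
    y_pred: shape (n, q)    predicted quantiles, one column per level
    quantiles: shape (q,)   quantile levels in (0, 1)
    """
    y_true = np.asarray(y_true, dtype=float)[:, None]
    y_pred = np.asarray(y_pred, dtype=float)
    q = np.asarray(quantiles, dtype=float)[None, :]
    diff = y_true - y_pred
    # pinball loss: q * diff when under-predicting, (q - 1) * diff when over-predicting
    pinball = np.maximum(q * diff, (q - 1.0) * diff)
    return 2.0 * pinball.mean()
```

With a single 0.5 quantile this reduces to the mean absolute error of the median forecast, which is a quick sanity check on the scaling.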
Comments¶
With relatively little tuning, this LSTM model produces reasonable predictions given the narrow scope of the training data.
Coverage is slightly biased: actual prices fall below the lower quantile range and above the upper quantile range more often than they should. Still, the model captures the general daily trend well and provides uncertainty bounds that mostly correspond to reality.
It also completely fails to account for large price spikes, which happen too infrequently for the model to predict accurately. A second model would likely need to be trained specifically to identify periods with a high probability of price spikes. That model would need more data, including additional nodes for spike examples and other types of data such as generation mix and weather forecasts. This is considered out of scope for this project, but its implementation would be relatively straightforward given the project's modular structure.
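The coverage remark above can be checked numerically by comparing each nominal quantile level with the fraction of observations that fall at or below the corresponding predicted quantile. The helper below is a minimal sketch, assuming predictions are shaped `(n_samples, n_quantiles)` as the plotting code implies; `empirical_coverage` is a hypothetical name, not part of the project.

```python
import numpy as np

def empirical_coverage(y_true, y_pred, quantiles):
    """Return {nominal level: observed fraction of y_true <= predicted quantile}."""
    y_true = np.asarray(y_true, dtype=float)[:, None]
    y_pred = np.asarray(y_pred, dtype=float)
    observed = (y_true <= y_pred).mean(axis=0)  # one fraction per quantile column
    return {float(q): float(o) for q, o in zip(quantiles, observed)}

# Toy example with made-up numbers: every row predicts the same two quantiles
y_true_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_demo = np.tile([2.5, 10.0], (4, 1))
coverage_demo = empirical_coverage(y_true_demo, y_pred_demo, [0.5, 0.9])
```

For a well-calibrated model the observed fractions sit close to the nominal levels. Observed values above the nominal level at low quantiles, or below it at high quantiles, would be consistent with the tails being missed.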
from datetime import datetime
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly
import plotly.colors as pc
import plotly.graph_objects as go
import yaml
from price_forecasting.config import MODELS_DIR, PROCESSED_DATA_DIR
from price_forecasting.utils.scoring_tools import get_mean_crps
Loading Data¶
MODEL_DIR = MODELS_DIR / 'LSTM_v1'
# load config file and variables from model run
with open(MODEL_DIR / 'config.yaml', 'r') as f:
    config = yaml.safe_load(f)
DATA_SOURCE = PROCESSED_DATA_DIR / config['data_source']
quantiles = np.array(config['quantiles'])
# load X data in DataFrame format
X_test = pd.read_parquet(DATA_SOURCE / 'X_test.pqt')
# load y_test data and format as numpy array
y_test = pd.read_parquet(DATA_SOURCE / 'y_test.pqt').to_numpy()
y_test = y_test.reshape([-1])
# load y predictions from model
y_pred = np.load(MODEL_DIR / 'y_pred.npy')
dam = X_test['DAM_PRC']  # day-ahead price data
# timestamps converted to Pacific time for plotting
time = X_test.index
time = time.tz_convert('US/Pacific')
Plotting¶
plotly.offline.init_notebook_mode()
def quantile_plot():
    """Plot the predicted quantile bands against the actual RTM price."""
    fig = go.Figure()
    n = len(quantiles)
    colors = pc.sample_colorscale('Viridis', [i/n for i in range(n)])
    # one trace per quantile; every trace after the first fills down to the previous one
    for i, yi in enumerate(y_pred.T):
        if i == 0:
            fill = None
        else:
            fill = 'tonexty'
        # predictions are deviations from the day-ahead price, so add it back
        fig.add_trace(go.Scatter(x=time, y=yi + dam, mode='lines', name=str(quantiles[i]),
                                 line=dict(width=1, color=colors[i]), fill=fill))
    # actual real-time price for comparison
    fig.add_trace(go.Scatter(x=time, y=y_test + dam, mode='lines', name='RTM Price',
                             line=dict(width=2, color='black', dash='dot')))
    # initial view: first week of the prediction period
    start = datetime.fromisoformat('2025-03-01 00:00:00')
    end = datetime.fromisoformat('2025-03-08 00:00:00')
    fig.update_layout(title='SP15 RTM Price Prediction',
                      xaxis_title='Time',
                      width=1100,
                      height=500,
                      yaxis_title='Price ($/MWh)',
                      xaxis=dict(range=[start, end]),
                      yaxis=dict(range=[-100, 250]),
                      )
    fig.show()

quantile_plot()